lib: nexthops compare vrf only if ip type#5368
lib: nexthops compare vrf only if ip type#5368tylerlinp wants to merge 1 commit intoFRRouting:masterfrom
Conversation
polychaeta
left a comment
There was a problem hiding this comment.
Thanks for your contribution to FRR!
- One of your commits is missing a
Signed-off-byline; we can't accept your contribution until all of your commits have one - One of your commits does not have a blank line between the summary and body; this will break
git log --oneline
If you are a new contributor to FRR, please see our contributing guidelines.
| switch (next1->type) { | ||
| case NEXTHOP_TYPE_IPV4: | ||
| case NEXTHOP_TYPE_IPV6: | ||
| if (next1->vrf_id < next2->vrf_id) |
There was a problem hiding this comment.
what if vrf is used not only with IPv4/IPv6 nexthop, but nexthop as interface?
There was a problem hiding this comment.
If ifindex given, vrf is a redundant term, actually it is get from ifindex, further more interface may change vrf, then RTM_DELROUTE handling failed, netlink message doesn't contain nexthop vrf either.
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
To compare nexthops, if given ifindex, it is enough to compare ifindex, the vrf is get from ifindex, and ifindex is more reliable. For blackhole, I think it is a special interface, vrf may be useless. So only type ip need to compare vrf. Signed-off-by: Tyler Li <[email protected]>
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9768/ This is a comment from an automated CI system. clang_check |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9769/ This is a comment from an automated CI system. clang_check |
riw777
left a comment
There was a problem hiding this comment.
It seems, to me, that moving this code would allow processes to delete routes they shouldn't be deleting... I think this is the wrong place to fix this problem.
|
Okay, I took a couple hours to look at this one. So this is only an issue with kernel routes. And it seems this specific case is fixed with current master with all cases but It was most likely fixed due to #5184 since we reprocess the kernel routes on the interface event. However, with nexthops of type Therefore, the fix might actually be to do this (untested): BUT we do have a second bug actually: We don't handle deletes from the kernel when the nexthop vrf is specified, regardless of whether it changed. and its still present in the rib: and this seems to be because we are not passing the nexthop's vrf_id into which is not being set in the netlink code. The |
|
So two commits are probably needed to fix this one.
That should resolve the issue without having to change the nexthop comparison function. |
The root problem is why we need vrf_id if nexthop ifindex given? We know if only nexthop ip given, we need vrf_id for lookup the outer interface, and then send nexthop ip and outer interface in netlink. (We can see the function nexthop_same_firsthop, what we really need is nexthop ip and outer interface)But if ifindex given, it is enough for us, the vrf_id is redundant. Keep a vrf_id in this type only makes us do more works useless or harmfully(if we doing something wrong like this problem). So I think Tyler‘s modifies is better. |
|
@ryan44guo I'm not confident enough to answer whether the We no longer receive explicit route deletes from the kernel on interface events that cause routes to be removed (they are silently deleted and we are expected to know). ex) This is with this patch added to current master on a 4.15 kernel: The route with nexthop of type |
|
@sworleys Yes, you are right. My patch can not fix the issue in your example. Then kernel doesn't send RTM_DELROUTE. |
bisdhdh
left a comment
There was a problem hiding this comment.
I too think it is not the correct place to fix this issue. It might be fixing one issue but opening ends for other issues to come.
If the kernel doesn't send DELROUTE, this is the kernel's bug, it should send DELROUTE to netlink if it had sent ADDROUTE to. That is 2 separated problem, although this modifier can make this route inactive. But make route inactive is not equal to remove route, so that is not a correct workaround. We can assume if the interface bind back to the origin vrf, in frr logic, the route can active again, but does kernel do the same thing? To the root reason, it is a vrf+route thing, it can not be repaired by nexthop vrf stuff. |
Can you give an example? We only have an example that processes not delete routes which should be deleted now. |
Actually, no it was intentionally done in the kernel to limit route notifications on interface events from what I have been told. |
If so, we can assume that is their design, although I doubt that is a good design. You know what they did is removing static routes which nexthop point to down interface, so when interface up, it can not been added back. I think that is not what we hope to do. When change upper(vrf), if seems a down and an up, so the static route is lost. |
|
@ryan44guo I added in #5184 to handle the lack of explicit RTM_DELROUTE for kernel routes. I shared my suggestions earlier to add to this patch so that we can handle the vrf changing case. I tested it and it seems to work on my 4.19 kernel.
If we add (2) I think we will no longer need tyler's change with the vrf_id though. |
|
It might be worth discussing this in our slack instead. If you aren't a member, please join by clicking the slack icon and self-inviting yourself https://frrouting.org/#participate We can discuss this further in the #development channel or a private group one with everyone |
|
|
partially fixed by #5553 |
Using frr7.2, I encounterd a problem that zebra crashes when removing vrf(I would create an issue). And found when changing an interface vrf, route ff00::/8 deletes failed because of nexthop vrf changed.
To compare nexthops, if given ifindex, it is enough to compare ifindex, the vrf is get from ifindex, and ifindex is more reliable. For blackhole, I think it is a special interface, vrf may be useless. So only type ip need to compare vrf.
logs: